
22 research outputs found

Overcoming Exploration in Reinforcement Learning with Demonstrations

Author: Abbeel Pieter
Andrychowicz Marcin
McGrew Bob
Nair Ashvin
Zaremba Wojciech
Publication date: 25/02/2018

Exploration in environments with sparse rewards has been a persistent problem in reinforcement learning (RL). Many tasks are natural to specify with a sparse reward, and manually shaping a reward function can result in suboptimal performance. However, finding a non-zero reward is exponentially more difficult with increasing task horizon or action dimensionality. This puts many real-world tasks out of practical reach of RL methods. In this work, we use demonstrations to overcome the exploration problem and successfully learn to perform long-horizon, multi-step robotics tasks with continuous control such as stacking blocks with a robot arm. Our method, which builds on top of Deep Deterministic Policy Gradients and Hindsight Experience Replay, provides an order of magnitude of speedup over RL on simulated robotics tasks. It is simple to implement and makes only the additional assumption that we can collect a small set of demonstrations. Furthermore, our method is able to solve tasks not solvable by either RL or behavior cloning alone, and often ends up outperforming the demonstrator policy. Comment: 8 pages, ICRA 201

arXiv.org e-Print Archive

Crossref

Domain Randomization and Generative Models for Robotic Grasping

Author: Abbeel Pieter
Andrychowicz Marcin
Biewald Lukas
Duan Rocky
Handa Ankur
Kumar Vikash
McGrew Bob
Schneider Jonas
Tobin Joshua
Welinder Peter
Zaremba Wojciech
Publication date: 03/04/2018

Deep learning-based robotic grasping has made significant progress thanks to algorithmic improvements and increased data availability. However, state-of-the-art models are often trained on as few as hundreds or thousands of unique object instances, and as a result generalization can be a challenge. In this work, we explore a novel data generation pipeline for training a deep neural network to perform grasp planning that applies the idea of domain randomization to object synthesis. We generate millions of unique, unrealistic procedurally generated objects, and train a deep neural network to perform grasp planning on these objects. Since the distribution of successful grasps for a given object can be highly multimodal, we propose an autoregressive grasp planning model that maps sensor inputs of a scene to a probability distribution over possible grasps. This model allows us to sample grasps efficiently at test time (or avoid sampling entirely). We evaluate our model architecture and data generation pipeline in simulation and the real world. We find we can achieve a >90% success rate on previously unseen realistic objects at test time in simulation despite having only been trained on random objects. We also demonstrate an 80% success rate on real-world grasp attempts despite having only been trained on random simulated objects. Comment: 8 pages, 11 figures. Submitted to 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2018

arXiv.org e-Print Archive

Crossref

The Identification of Individuals with Disabilities in National Databases: Creating a Failure to Communicate

Author: Algozzine Bob
McGrew Kevin S
Spiegel Amy N
Thurlow Martha L
Ysseldyke James E
Publication venue: DigitalCommons@University of Nebraska - Lincoln
Publication date: 01/01/1995

The purpose of this study was to analyze similarities and differences in how students with disabilities are identified in national databases. National data collection programs in the U.S. Departments of Education, Commerce, Labor, Justice, and Health and Human Services, as well as databases from the National Science Foundation, the American Council of Education, and the College Board, were examined. Nineteen national data collection programs were selected as being potentially useful in the extraction of policy-relevant information on the educational status and performance of students with disabilities. Among these 19 programs there was significant variability in the disability categories used. These programs were targeted for two reasons: (a) their potential usefulness in providing indicators of domains in key models of educational outcomes for children and youth with disabilities, and (b) their prominence in current efforts to monitor progress toward the attainment of national education goals. Discussed are issues related to improving disability identification in large-scale data collection programs and the effects of these issues on reporting policy-relevant information.

DigitalCommons@University of Nebraska

Conference report

Author: Bob McGrew
Kevin Leyton-Brown
Ryan Porter
Publication venue: Association for Computing Machinery (ACM)

Crossref

The Identification of People With Disabilities in National Databases: A Failure to Communicate (NCEO Synthesis Report)

Author: Algozzine Bob
McGrew Kevin
Spiegel Amy
Thurlow Martha
Ysseldyke James
Publication venue: University of Minnesota, Institute on Community Integration, National Center on Educational Outcomes (NCEO)
Publication date: 01/09/1993

A report summarizing findings for policymakers, researchers, and educators that focuses on assessment, accommodations, and accountability in relation to K-12 students with disabilities. The Center is supported through a cooperative agreement with the U.S. Department of Education, Office of Special Education Programs (1990-1995: H159C00004; 1995-2000: H159C50004). Opinions or points of view expressed within this document do not necessarily represent those of the U.S. Department of Education or Offices within it.

University of Minnesota Digital Conservancy

Social Compacts in Regional and Global Perspective

Crossref

Learning dexterous in-hand manipulation

Author: Abadi M
Alex Ray
Antonova R
Arthur Petron
Barth-Maron G
Bertsekas DP
Bob McGrew
Bowen Baker
Brockman G
Christiano PF
Dafle NC
Finn C
Glenn Powell
Gupta A
Jakub Pachocki
Jonas Schneider
Josh Tobin
Kalashnikov D
Kingma D
Kumar V
Levine S
Lilian Weng
Maciek Chociej
Mahler J
Marcin A
Matthias Plappert
Mordatch I
Nair V
OpenAI: Marcin Andrychowicz
Peng XB
Peter Welinder
Pinto L
Pinto L
Plappert M
Rafal Józefowicz
Rajeswaran A
Rusu AA
Schulman J
Schulman J
Sutton RS
Szymon Sidor
Tan J
Tobin J
Tobin J
Tzeng E
Wojciech Zaremba
Zhu Y
Publication venue: SAGE Publications

Crossref